Outlier Detection and Influential Point Observation in Linear Regression Using Clustering Techniques in Financial Time Series Data
Authors
Abstract
Modern computing technology makes data gathering and storage easier, which creates a new range of problems and challenges for data analysis. Detection of outliers in time series data has gained much attention in recent years. We present a new clustering-based approach to outlier detection. The Expectation Maximization clustering (EM-Cluster) algorithm is used to find the “optimal” parameters of the distributions, that is, the parameters that maximize the likelihood function. A regression-based outlier technique is used to detect influential points. The analysis of outliers and influential points is an important step in regression diagnostics, and several indicators are used to identify and analyze outliers. When applied to a synthetic data set, the proposed approach gave effective results within optimum time and space. This paper also investigates outliers, volatility clustering, and the risk-return trade-off in the Indian stock market indices NSE Nifty and BSE SENSEX. Engle's ARCH test and an AR(1)-EGARCH(p, q)-in-Mean model were employed to address the objectives of the study. The results reveal that volatility is persistent and that there is a leverage effect in the Indian stock markets.
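The abstract does not state the exact outlier rule or the diagnostic indicators used, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' method. It fits a one-dimensional Gaussian mixture by Expectation Maximization and flags points that either have very low fitted mixture density or whose most likely component is a tiny cluster; the `quantile=0.01` and `min_weight=0.05` thresholds are hypothetical. For influence, it computes leverage and Cook's distance from an ordinary least squares fit, using the common 4/n cut-off as a rule of thumb. All function names are illustrative, and the data are synthetic.

```python
import numpy as np


def _e_step(x, w, mu, var):
    """E-step: mixture log-density per point and component responsibilities."""
    # log(w_j * N(x_i | mu_j, var_j)), shape (n, k)
    log_p = (np.log(w) - 0.5 * np.log(2.0 * np.pi * var)
             - 0.5 * (x[:, None] - mu) ** 2 / var)
    m = log_p.max(axis=1, keepdims=True)
    # Log-sum-exp over components avoids underflow for far-away points.
    log_density = m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))
    resp = np.exp(log_p - log_density[:, None])
    return log_density, resp


def em_gaussian_mixture_1d(x, k=2, n_iter=500, tol=1e-10):
    """Fit a k-component 1-D Gaussian mixture by Expectation Maximization."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Deterministic start: means at spread-out quantiles, pooled variance.
    mu = np.quantile(x, np.linspace(0.25, 0.75, k))
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    var_floor = 1e-6 * x.var()  # assumed guard against collapsing variances
    prev_ll = -np.inf
    for _ in range(n_iter):
        log_density, resp = _e_step(x, w, mu, var)
        ll = log_density.sum()
        if ll - prev_ll < tol:  # log-likelihood has stopped increasing
            break
        prev_ll = ll
        # M-step: maximize the expected complete-data log-likelihood.
        nk = np.maximum(resp.sum(axis=0), np.finfo(float).tiny)
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, var_floor)
    log_density, resp = _e_step(x, w, mu, var)
    return w, mu, var, log_density, resp


def em_outliers(x, k=2, quantile=0.01, min_weight=0.05):
    """Indices of points with very low mixture density, or whose most likely
    component is a tiny cluster. Both thresholds are illustrative assumptions."""
    w, _, _, log_density, resp = em_gaussian_mixture_1d(x, k)
    low_density = log_density <= np.quantile(log_density, quantile)
    in_small_cluster = w[resp.argmax(axis=1)] < min_weight
    return np.flatnonzero(low_density | in_small_cluster)


def cooks_distance(x, y):
    """Leverage h_ii and Cook's distance D_i for the OLS fit y ~ 1 + x."""
    X = np.column_stack([np.ones(len(x)), x])
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    # Leverage: diagonal of the hat matrix H = X (X'X)^{-1} X'.
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    s2 = e @ e / (n - p)  # residual variance estimate
    d = e ** 2 / (p * s2) * h / (1.0 - h) ** 2
    return h, d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    returns = rng.normal(0.0, 0.01, 300)  # synthetic "daily returns"
    returns[10] = 0.15                    # planted extreme return
    print("EM-flagged indices:", em_outliers(returns))

    x = rng.normal(0.0, 1.0, 200)
    y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, 200)
    x[0], y[0] = 6.0, -20.0               # planted high-leverage, badly fitted point
    h, d = cooks_distance(x, y)
    print("Influential (D > 4/n):", np.flatnonzero(d > 4 / len(x)))
```

On this synthetic data the planted points (index 10 in the returns, index 0 in the regression) should appear among the flagged indices. The log-sum-exp step and the deterministic quantile initialisation are design choices made here for numerical stability and reproducibility; a library implementation such as a general Gaussian mixture class would normally be preferred in practice.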
Similar resources
Outliers Detection for Regression using K-Means and Expected Maximization Methods in Time Series Data
The evolution of computing technology and the ever-increasing size and variety of data sets have created a new range of problems and challenges for data analysts, as well as new opportunities for intelligent systems in data analysis. This study concentrates on performing experimental analysis to find regression-based outliers and influential points using two standard algorithms for data clustering...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using the vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates, and inaccurate predictions. Thus, detecting these points and dealing with them properly, especially in relation to modeling and parameter estimation of VARMA m...
Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation
1- INTRODUCTION. The assessment of watershed sediment load is necessary for controlling soil erosion and reducing the potential for sediment production. Differing estimates of sediment amounts, along with the lack of long-term measurements, limit access to reliable data series of erosion rate and sediment yield. Therefore, the observed data of suspended sediment load could be used to ...
Detection of Outliers and Influential Observations in Linear Ridge Measurement Error Models with Stochastic Linear Restrictions
The aim of this paper is to propose some diagnostic methods in linear ridge measurement error models with stochastic linear restrictions using the corrected likelihood. Based on the bias-corrected estimation of model parameters, diagnostic measures are developed to identify outlying and influential observations. In addition, we derive the corrected score test statistic for outlier detection ba...
Diagnostic Measures in Ridge Regression Model with AR(1) Errors under the Stochastic Linear Restrictions
Outliers and influential observations have important effects on regression analysis. The goal of this paper is to extend the mean-shift model for detecting outliers to the ridge regression model in the presence of stochastic linear restrictions when the error terms follow an autoregressive AR(1) process. Furthermore, extensions of measures for diagnosing influential observations are ...